Add key normalisation functionality #16

nikhilwoodruff · 2025-07-27T10:38:18Z

Implements key normalisation functionality to convert primary and foreign keys in related tables to zero-based sequential indices while preserving relationships.

When constructing microdata, we often work with datasets that have inconsistent key formats: some use sequential IDs starting from 1, while others use large sparse integers such as user IDs 101, 105, 103. This creates unnecessary complexity and memory overhead. By normalising all keys to a common zero-based sequential format, we can assume consistent key patterns across all datasets, which simplifies downstream processing and reduces memory usage.

The implementation adds normalise_table_keys() for multi-table normalisation with relationship preservation and normalise_single_table_keys() for single table scenarios. Foreign key relationships are automatically detected based on column name matching, though explicit specification is also supported. The functionality handles edge cases like duplicate keys, missing columns, and invalid references with clear error messages.
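To make the idea concrete, here is a minimal pure-Python sketch of the technique described above. It is not the code in src/policyengine_data/normalise_keys.py: the function name normalise_keys_sketch, its arguments, the list-of-dicts table format, and the choice to number keys in order of first appearance are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical table format: a list of rows, each mapping column name -> key.
Table = List[Dict[str, int]]


def normalise_keys_sketch(
    tables: Dict[str, Table],
    primary_keys: Dict[str, str],
    foreign_keys: Optional[Dict[str, Dict[str, str]]] = None,
) -> Dict[str, Table]:
    """Remap primary and foreign keys to 0, 1, 2, ... (illustrative only)."""
    # Build one old -> new mapping per table, in order of first appearance.
    mappings: Dict[str, Dict[int, int]] = {}
    for name, pk in primary_keys.items():
        mapping: Dict[int, int] = {}
        for row in tables[name]:
            if pk not in row:
                raise ValueError(f"Primary key column '{pk}' missing from '{name}'")
            if row[pk] in mapping:
                raise ValueError(f"Duplicate key {row[pk]} in '{name}.{pk}'")
            mapping[row[pk]] = len(mapping)
        mappings[name] = mapping

    # Assumed auto-detection rule: a column that shares its name with another
    # table's primary key is treated as a foreign key to that table.
    if foreign_keys is None:
        foreign_keys = {}
        for name, rows in tables.items():
            columns = set(rows[0]) if rows else set()
            for other, pk in primary_keys.items():
                if other != name and pk != primary_keys.get(name) and pk in columns:
                    foreign_keys.setdefault(name, {})[pk] = other

    # Apply the mappings to copies of the rows, so the input is left untouched.
    result: Dict[str, Table] = {}
    for name, rows in tables.items():
        new_rows: Table = []
        for row in rows:
            new_row = dict(row)
            if name in primary_keys:
                new_row[primary_keys[name]] = mappings[name][row[primary_keys[name]]]
            for column, target in foreign_keys.get(name, {}).items():
                if column not in row:
                    raise ValueError(f"Foreign key column '{column}' missing from '{name}'")
                if row[column] not in mappings[target]:
                    raise ValueError(
                        f"'{name}.{column}' refers to {row[column]}, "
                        f"which is not a key in '{target}'"
                    )
                new_row[column] = mappings[target][row[column]]
            new_rows.append(new_row)
        result[name] = new_rows
    return result


# Example data using the sparse IDs mentioned in the description.
tables = {
    "persons": [
        {"person_id": 101, "household_id": 7},
        {"person_id": 105, "household_id": 3},
        {"person_id": 103, "household_id": 7},
    ],
    "households": [{"household_id": 3}, {"household_id": 7}],
}
primary_keys = {"persons": "person_id", "households": "household_id"}
normalised = normalise_keys_sketch(tables, primary_keys)
```

In this sketch, person IDs 101, 105, 103 become 0, 1, 2 and household IDs 3, 7 become 0, 1, so the first person's household_id is rewritten from 7 to 1 and still points at the same household. A real implementation may order the new indices differently (for example by sorted key value); the point is only that the same mapping is applied to a primary key and to every column that references it.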

The test suite adds 15 test cases covering normal operations, edge cases, and error handling. All existing tests continue to pass, and the code is formatted with black and isort.

Fixes #15

Implements normalise_table_keys and normalise_single_table_keys functions to convert primary and foreign keys to zero-based sequential indices while preserving relationships between tables.

Features:
- Auto-detection of foreign key relationships
- Explicit foreign key specification support
- Comprehensive error handling and validation
- Full test coverage with 15 test cases
- Documentation with examples and use cases

Fixes #15

codecov · 2025-07-27T10:38:45Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

juaristi22

Very nice, very clean. Are we thinking of using this on person or household ids for example?

Small note: I saw you are missing the changelog entry, and I think the reason that hasn't failed is that a changelog entry already exists in main (the versioning failed on my code that was last merged). I was hoping to fix it with my open Dataset migration PR, but I think it's best if I quickly do it on main already. You might wanna add a new entry for when that's fixed? Not blocking though (it wouldn't be the end of the world if there is no entry when we merge this anyway).

src/policyengine_data/normalise_keys.py

tests/test_normalise_keys.py


nikhilwoodruff self-assigned this Jul 27, 2025

nikhilwoodruff requested a review from juaristi22 July 27, 2025 12:05

juaristi22 approved these changes Jul 27, 2025


juaristi22 added 2 commits August 7, 2025 11:01

Merge branch 'main' of https://github.com/PolicyEngine/policyengine-data into add-key-normalisation

f866a84

add start_index to normalise_table_keys

5a837f9

juaristi22 merged commit 7a533dc into main Aug 7, 2025
4 checks passed
